Abstract

In a series of recent works, Boyd, Diaconis, and their co-authors 
have introduced a semidefinite programming approach for computing 
the fastest mixing Markov chain on a graph of allowed transitions,
given a target stationary distribution. In this paper, we show that 
standard mixing-time analysis techniques — variational characteriza- 
tions, conductance, canonical paths — can be used to give simple, non- 
trivial lower and upper bounds on the fastest mixing time. To test 
the applicability of this idea, we consider several detailed examples 
including the Glauber dynamics of the Ising model — and get sharp 
bounds. 



Keywords: Rapidly mixing Markov chains, fastest mixing, semidefinite pro- 
gramming, canonical paths, conductance. 



1 Introduction 

Sampling from a complex collection of objects is a basic procedure in physics, 
statistics and computer science. A widely used technique, known as Markov 
chain Monte Carlo (MCMC), consists in designing a Markov chain on the 
set to be sampled such that the law of the chain converges to the desired 
distribution. The chain is run long enough for a sample to be picked from a
good approximation of the stationary distribution. The time one has to wait
in order for this approximation to be satisfactory is known as the mixing
time. In practice, it is crucial that this parameter be small. See e.g. [J03] for
a survey of theoretical results on MCMC. 

One way to picture a Markov chain (MC) on a combinatorial structure is 
to think of the states as nodes and of the transitions as edges. For a chain to 
be implementable, the neighbourhood structure surrounding each node must
be relatively simple. Under this constraint, one has to choose a set of allowed 
transitions that is most likely to produce fast convergence. This is usually 
done in a heuristic manner. 

Once a graph of transitions has been chosen, there still is room for im- 
provement. Indeed, one has some freedom in assigning transition proba- 
bilities to each edge under the requirement, however, that the stationary 
distribution be of the right form. It turns out that choosing appropriately 
those probabilities can lead to a sizable decrease in the mixing time. 

In this context, Boyd et al. [BDX04] have recently observed that minimizing
the mixing time of an MC on a graph of transitions with a given stationary
distribution can be formulated as a semidefinite program (SDP), a well-known
generalization of linear programming to matrices. See e.g. [BV03].
This enables the numerical computation of the fastest mixing chain on a
graph. Boyd et al. [BDX04] have solved numerically a number of simple
examples.

A further benefit of this approach is that it provides a tight lower bound 
on the optimal mixing time through the dual of the SDP. In a follow-up 
paper, Boyd et al. [BDSX04] have used this bound to exhibit an analytic
expression for the fastest chain, and prove its optimality, when the graph
is a simple path under the uniform distribution.

However, a weakness of the SDP formulation is that only small graphs can
be studied thoroughly: numerical solvers run in time polynomial in
the size of the graph, and chains of practical interest have prohibitively
large state spaces.
As for the dual, it is potentially useful from a theoretical point of view even
for complex chains, but Boyd et al. [BDX04] give no intuitive interpretation
of it, making it difficult to apply.

Our goal in this paper is to provide evidence that those shortcomings 
can be overcome by a simpler approach. Our claim arises from the following 
observation: one can obtain lower and upper bounds on the mixing time 
of completely specified chains by way of well-known techniques such as path 
coupling, conductance, canonical paths etc. [J03]: formally, those bounds are
parameterized by transition probabilities. This prompts the questions: can 
one optimize those bounds as functions of the transition probabilities, and 
how close to optimum can one get by doing so? 

1.1 Our results 

We show through general results and examples that for well-structured prob- 
lems, the above scheme can be implemented, and that it is capable of pro- 
viding nontrivial, sharp bounds. 

On the lower bound side, we use a standard extremal characterization to 
derive a general lower bound which has a simple geometrical interpretation. 
It consists in embedding the nodes of the graph into a Euclidean space
so as to stretch the nodes as much as possible under constraints on the 
distance separating nodes connected by an edge. We show through convex 
optimization arguments that it is actually tight. The simple interpretation 
makes it much easier to apply than the dual SDP mentioned above. Our 
result is similar to a bound obtained recently by Sun et al. [SBXD04] in a
different context. We also specialize the usual conductance bound to the 
context of fastest mixing. We apply those general results to several examples 
obtaining close-to-optimal lower bounds. 

On the upper bound side, it seems much harder to derive useful, general 
results. A trivial bound can be obtained by considering any chain on the 
graph, e.g. a canonical Metropolis-Hastings chain, and computing an upper 
bound on its mixing time. But as was shown by Boyd et al. [BDX04], there
can be a large (unbounded) gap between standard and optimal chains.
Instead, we show through examples that one can obtain almost tight bounds
by studying closely standard canonical paths arguments and minimizing the 
bound over transition probabilities. Put differently, our technique consists in 
identifying bottleneck edges and increasing the flow on them. The fact that 
this scheme can work on nontrivial Markov chains is not obvious a priori, 
and this constitutes our main result in the upper bound case. Moreover, 
this technique is constructive and it allows one to design a chain which might be
close to the fastest one. Our scheme is likely to work only on well-structured 
problems but, even in that case, there is no other non-numerical approach 
known — and the numerical approach breaks down on large-scale problems. 

Our main example is the Glauber dynamics of the Ising model, a problem 
which is beyond the reach of the numerical SDP approach. In the case of the 
tree, by a judicious choice of rates at which nodes are updated, we improve
the mixing time by an optimal factor. 

1.2 Organization of the paper 

We begin in Section 2 with a description of the setting and approach of
[BDX04]. We introduce our main techniques in Sections 3 and 4. Section 5
is devoted to optimal rates of the Glauber dynamics of the Ising model.



2 Preliminaries 
2.1 Setting 

We are given an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a probability distribution
$\pi$ defined on the nodes of $\mathcal{G}$. We seek to sample from $\pi$ and do so by running
a reversible Markov chain $(X_t)_{t \geq 0}$ on the state space $\mathcal{V}$ with stationary
distribution $\pi$, i.e. if $P = (P(i,j))_{i,j \in \mathcal{V}}$ denotes the transition matrix of
$(X_t)_{t \geq 0}$, we must have $\pi(i)P(i,j) = \pi(j)P(j,i)$, $\forall i,j \in \mathcal{V}$. We also require
that the only transitions allowed are those given by edges of $\mathcal{G}$, i.e.
$P(i,j) = 0$, $\forall (i,j) \notin \mathcal{E}$.
For convenience, we assume that all self-loops are present.

The time to reach stationarity is governed by the second largest eigenvalue
of $P$. More precisely, let $n = |\mathcal{V}|$ and
$1 = \lambda_1(P) \geq \lambda_2(P) \geq \cdots \geq \lambda_n(P) \geq -1$
be the eigenvalues of $P$. We measure the speed at which stationarity is
reached by the relaxation time $\tau_2(P) = \frac{1}{1 - \lambda_2(P)}$. See [AF04] for a thorough
discussion of other related quantities. The smaller $\lambda_2(P)$ (and therefore
$\tau_2(P)$) is, the faster $(X_t)$ approaches $\pi$. Given this observation, it is natural
to define the fastest mixing chain on $(\mathcal{G}, \pi)$ as the solution of the optimization
problem

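$$
\begin{array}{ll}
\min & \lambda_2(P) \\
\text{s.t.} & P(i,j) = 0, \quad \forall (i,j) \notin \mathcal{E}, \\
 & \pi(i)P(i,j) = \pi(j)P(j,i), \quad \forall (i,j) \in \mathcal{E}.
\end{array}
\tag{1}
$$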

In the remainder of this paper, we reserve the notation $P^\ast$ for a solution of (1),
which might not be unique, and let $\lambda_2^\ast = \lambda_2(P^\ast)$ and $\tau_2^\ast = \tau_2(P^\ast)$. Note that
our definition of fastest mixing differs slightly from that in [BDX04]. Here,
we take the usual approach of ignoring the smallest eigenvalue by considering
the possibility of adding a constant probability to each self-loop afterwards
in order to bound the smallest eigenvalue away from $-1$.
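As a small illustration of the feasible set of the problem above, the following Python sketch checks whether a candidate matrix is a transition matrix supported on the graph and in detailed balance with the target distribution. The function name `is_feasible`, the tolerance, and the 3-node path example are assumptions made for illustration; they do not come from [BDX04].

```python
def is_feasible(P, pi, edges, tol=1e-12):
    """Return True if P satisfies the constraints of the fastest mixing problem."""
    n = len(pi)
    # Allowed transitions: undirected edges plus all self-loops.
    allowed = {(i, i) for i in range(n)}
    for i, j in edges:
        allowed.add((i, j))
        allowed.add((j, i))
    for i in range(n):
        if abs(sum(P[i]) - 1.0) > tol or min(P[i]) < -tol:
            return False  # not a transition matrix
        for j in range(n):
            if (i, j) not in allowed and abs(P[i][j]) > tol:
                return False  # transition outside the graph
            if abs(pi[i] * P[i][j] - pi[j] * P[j][i]) > tol:
                return False  # detailed balance fails
    return True


# Hypothetical example: path on 3 nodes with uniform target distribution.
pi = [1 / 3, 1 / 3, 1 / 3]
edges = [(0, 1), (1, 2)]
lazy_walk = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
shortcut = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
print(is_feasible(lazy_walk, pi, edges), is_feasible(shortcut, pi, edges))
```

The second matrix is reversible but uses the transition between the two endpoints of the path, which is not an edge, so it is rejected.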

2.2 Fastest mixing via SDP 

The main observation in [BDX04] is that (1) is actually a semidefinite program
(SDP). See e.g. [BV03] for background on convex and semidefinite
programming. This observation makes possible the numerical computation
of optimal transition matrices. Unfortunately, since the running time of SDP
algorithms is at best polynomial in the size of the state space, this allows
one to study only small graphs, for which sampling is actually quite trivial. One
idea put forward by Boyd et al. [BDX04] is to solve the SDP on small instances
of large combinatorial problems and try to guess the structure of
the optimal matrix from the results. This is the approach used in [BDSX04]
to identify the optimal chain on the path. The prospect of reproducing this
type of exact result in other cases seems limited.

From a theoretical point of view, an interesting consequence of the SDP 
formulation is the existence of a dual which can be used to give lower bounds 
on the optimal mixing time. Let $\|Y\|_\ast$ be the sum of the singular values
of $Y$. Then, in the case of the uniform stationary distribution, the dual (of
the more general version taking into account the smallest eigenvalue) has the
form [BDX04]

$$
\begin{array}{ll}
\max_{z, Y} & \sum_{i=1}^{n} z(i) \\
\text{s.t.} & z(i) + z(j) \leq 2Y(i,j), \quad \forall (i,j) \in \mathcal{E}, \\
 & Y = Y^T, \quad \|Y\|_\ast \leq 1.
\end{array}
\tag{2}
$$

Any feasible solution of (2) provides a lower bound on the best mixing time
achievable on $(\mathcal{G}, \pi)$. Moreover, strong duality holds. In [BDSX04], this
is used to prove optimality of a conjectured fastest chain when the graph
is a path. Note that giving an intuitive interpretation of this optimization
problem is not straightforward. This is a potential obstacle to the devising
of good feasible solutions.
3 Lower bounds

In this section, we discuss general lower bounds on fastest mixing that can be 
derived from common techniques for completely specified chains. We apply 
our bounds to several examples. 

3.1 Variational characterization 

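The standard lower bound for completely specified chains is based on a variational
characterization of the second eigenvalue of the transition matrix. See
e.g. [AF04]. To reveal the geometric flavor of our result, we will consider a
more general bound. Let $\psi_1, \ldots, \psi_n : \mathcal{V} \to \mathbb{R}$ be functions with expectation 0
under $\pi$, i.e. $\sum_{i \in \mathcal{V}} \pi(i)\psi_l(i) = 0$ for all $l$ (where, as before, $n$ is the number
of nodes). For all $i \in \mathcal{V}$, think of $\Psi(i) = (\psi_1(i), \ldots, \psi_n(i))$ as a vector associated
to node $i$. Therefore, $\Psi(1), \ldots, \Psi(n)$ is an embedding of the graph into
$\mathbb{R}^n$. For each $l$ separately, we have the inequality

$$
1 - \lambda_2(P) \leq \frac{\sum_{(i,j) \in \mathcal{E}} (\psi_l(i) - \psi_l(j))^2 Q(i,j)}{\sum_{k \in \mathcal{V}} \pi(k)\psi_l(k)^2},
$$

where $Q(i,j) = \pi(i)P(i,j)$ and the sum in the numerator runs over the undirected
edges. Multiplying through by the denominator and summing over $l$ we get the bound

$$
1 - \lambda_2(P) \leq \frac{\sum_{(i,j) \in \mathcal{E}} \|\Psi(i) - \Psi(j)\|^2 Q(i,j)}{\sum_{k \in \mathcal{V}} \pi(k)\|\Psi(k)\|^2},
$$

where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^n$. To turn the r.h.s. into a
bound on $1 - \lambda_2^\ast$, we maximize over $Q$. But note that, for $\psi_1, \ldots, \psi_n$ fixed,
the r.h.s. is linear in $Q$ so this can be expressed as the linear program

$$
\begin{array}{ll}
1 - \lambda_2^\ast \leq \max_{Q} & \frac{\sum_{(i,j) \in \mathcal{E}} \|\Psi(i) - \Psi(j)\|^2 Q(i,j)}{\sum_{k \in \mathcal{V}} \pi(k)\|\Psi(k)\|^2} \\
\text{s.t.} & Q(i,j) = 0, \quad \forall (i,j) \notin \mathcal{E}, \\
 & Q(i,j) = Q(j,i), \quad \forall (i,j) \in \mathcal{E},
\end{array}
\tag{3}
$$

where $Q$ ranges over matrices of the form $Q(i,j) = \pi(i)P(i,j)$ for a transition
matrix $P$.
The dual of this linear program is^1

$$
\begin{array}{ll}
1 - \lambda_2^\ast \leq \min_{z} & \sum_{i \in \mathcal{V}} \pi(i)z(i) \\
\text{s.t.} & \frac{\|\Psi(i) - \Psi(j)\|^2}{\sum_{k \in \mathcal{V}} \pi(k)\|\Psi(k)\|^2} \leq z(i) + z(j), \quad \forall (i,j) \in \mathcal{E}.
\end{array}
\tag{4}
$$

Note the similarity with (2). Note also that we can now minimize over
$\psi_1, \ldots, \psi_n$ as well to get the best bound possible. Make the change of variables
$w(i) = \left(\sum_{k \in \mathcal{V}} \pi(k)\|\Psi(k)\|^2\right) z(i)$, $i \in \mathcal{V}$, assume w.l.o.g. that
$\sum_{i \in \mathcal{V}} \pi(i)w(i) = 1$ (one can always renormalize the $\Psi$'s by $\sqrt{\sum_{i \in \mathcal{V}} \pi(i)w(i)}$)
and take the multiplicative inverse of the objective function. This finally
leads to: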

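Proposition 1 The optimal relaxation time on $(\mathcal{G}, \pi)$ is bounded from below
by

$$
\begin{array}{ll}
\tau_2^\ast \geq \max_{\Psi, w} & \sum_{k \in \mathcal{V}} \pi(k)\|\Psi(k)\|^2 \\
\text{s.t.} & \|\Psi(i) - \Psi(j)\|^2 \leq w(i) + w(j), \quad \forall (i,j) \in \mathcal{E}, \\
 & \sum_{i \in \mathcal{V}} \pi(i)\Psi(i) = 0, \quad \sum_{i \in \mathcal{V}} \pi(i)w(i) = 1.
\end{array}
\tag{5}
$$

Moreover, this bound is tight, i.e. we have equality above.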

Informally, we seek to embed the graph into $\mathbb{R}^n$ so as to spread the nodes as
much as possible under local constraints over the distances separating nodes
connected by edges. The $w$'s give some slack in choosing which edges are
bound by stronger or weaker constraints. See the examples. This bound is
similar to that obtained recently by [SBXD04] in a continuous-time context.
There, however, the r.h.s. in the inter-node distance constraint is a fixed
weight $d_{ij}$ (instead of $w(i) + w(j)$), giving rise to a quite different problem.

Proof (of tightness): This follows from convex optimization duality. To see
this, we go back to formulation (4). Note that w.l.o.g., we can assume that
$\sum_{k \in \mathcal{V}} \pi(k)\|\Psi(k)\|^2 = 1$. Make the change of variables $w(i) = \|\Psi(i)\|^2 - z(i)$
for all $i \in \mathcal{V}$, change the objective to $1 - \sum_{i=1}^{n} \pi(i)z(i) = \sum_{i=1}^{n} \pi(i)w(i)$,
and set $Y(i,j) = \Psi(i)^T\Psi(j)$ for all $i,j \in \mathcal{V}$. Then, using the Gram matrix
representation for symmetric positive semidefinite matrices (an $n \times n$ matrix
$M$ is symmetric positive semidefinite if and only if there is a set $x_1, \ldots, x_n$ of
vectors in $\mathbb{R}^n$ such that $M_{ij} = x_i^T x_j$; see e.g. [HJ85]), we get the equivalent
bound

$$
\begin{array}{ll}
\lambda_2^\ast \geq \max_{w, Y} & \sum_{i=1}^{n} \pi(i)w(i) \\
\text{s.t.} & w(i) + w(j) \leq 2Y(i,j), \quad \forall (i,j) \in \mathcal{E}, \\
 & Y \succeq 0, \quad \sum_{i,j \in \mathcal{V}} \pi(i)\pi(j)Y(i,j) = 0, \quad \sum_{k=1}^{n} \pi(k)Y(k,k) = 1,
\end{array}
\tag{6}
$$

where $A \succeq 0$ indicates that $A$ is positive semidefinite. One can check that
the dual of this convex optimization problem is equivalent to minimizing the
second largest eigenvalue over reversible transition matrices on $(\mathcal{G}, \pi)$. ■

^1 To obtain this particular form, one needs to consider only those $Q(i,j)$'s such that
$(i,j) \in \mathcal{E}$ and then only one of $Q(i,j)$ and $Q(j,i)$.

Contrary to the standard setting, the multidimensionality of the embedding
seems necessary in the fastest mixing context. In particular, plugging
the eigenvector corresponding to the second largest eigenvalue of the optimal
matrix as $\psi_1$ (with all other coordinates 0) into (5) does not necessarily give
a tight bound because there is no guarantee that the optimal $w$'s will allow
enough room for a 1-dimensional embedding to spread sufficiently.

Remark 1 The above bound is actually very similar to that in the case of
completely specified chains which can be reformulated as

$$
\begin{array}{ll}
\tau_2(P) \geq \max_{\Psi} & \sum_{k \in \mathcal{V}} \pi(k)\|\Psi(k)\|^2 \\
\text{s.t.} & \sum_{(i,j) \in \mathcal{E}} \|\Psi(i) - \Psi(j)\|^2 Q(i,j) = 1, \quad \sum_{i \in \mathcal{V}} \pi(i)\Psi(i) = 0.
\end{array}
\tag{7}
$$

Here the "slack" takes the form of a fixed weighted average over inter-node
distances. The multidimensionality turns out not to be necessary in this case.

Remark 2 The same scheme can be applied to the log-Sobolev constant.
In that case, one maximizes the entropy instead of the variance. See also
[BDX04].

Remark 3 The smallest eigenvalue has its own geometry. There, the bound
is the same with the term $\|\Psi(i) - \Psi(j)\|^2$ in the inter-node distance
constraint replaced by $\|\Psi(i) + \Psi(j)\|^2$. The formulation (2) is equivalent to a
combination of the two geometries (smallest and second largest eigenvalues).
3.2 Conductance

As an illustration of Proposition 1, we give a simple adaptation of the conductance
bound to the context of fastest mixing.

Proposition 2 Let $\Upsilon$ be the weighted vertex expansion of $(\mathcal{G}, \pi)$,

$$
\Upsilon = \min\left\{ \frac{\pi(\delta S)}{\pi(S) \wedge \pi(S^c)} : S \subset \mathcal{V} \right\},
$$

where $a \wedge b = \min\{a, b\}$ and $\delta S$ is the set of nodes $i \in S^c$ such that there is
a $j \in S$ with $(i,j) \in \mathcal{E}$. We have the following bound:

$$
\tau_2^\ast \geq \frac{1}{2\Upsilon}.
$$

This bound is actually folklore. It is easily derived from the usual conductance
bound and is often used to obtain lower bounds on completely
specified chains. Here we give a direct proof.

Proof: A simple embedding of $\mathcal{G}$ in $\mathbb{R}^n$ is to map each node to one of only
2 points $x_0, x_1$. Say the subset $S \subset \mathcal{V}$ is mapped to $x_0$. Then we must
have $\pi(S)x_0 + \pi(S^c)x_1 = 0$. Also, since the distance between nodes inside
$S$ (resp. $S^c$) is 0, we can set w.l.o.g. the $w$'s of nodes not on the boundary
of $S$ (resp. $S^c$) to 0. We assign to the points on the boundary of $S$ (resp.
$S^c$) the value $w_0$ (resp. $w_1$). Since we care only about the sum $w_0 + w_1$
and the only constraint on $w_0, w_1$ is $\pi(\delta S^c)w_0 + \pi(\delta S)w_1 = 1$, it is to our
advantage to fix one of $w_0, w_1$ to 0 as well. Say $\pi(\delta S^c) \leq \pi(\delta S)$ w.l.o.g.
Then $\|x_0 - x_1\|^2 = w_0 = (\pi(\delta S^c))^{-1}$ and $w_1 = 0$. An easy calculation gives
$\|x_0\|^2 = (\pi(\delta S^c))^{-1}(1 - \pi(S))^2$ and $\|x_1\|^2 = (\pi(\delta S^c))^{-1}\pi(S)^2$. Therefore,
$\pi(S)\|x_0\|^2 + \pi(S^c)\|x_1\|^2 = \frac{\pi(S)\pi(S^c)}{\pi(\delta S^c)}$ and the result follows. ■
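For small graphs the weighted vertex expansion can be evaluated by brute force over all proper nonempty subsets. The sketch below is only an illustration: the helper names `vertex_expansion` and `two_cliques` and the node numbering are assumptions, and the exponential enumeration is usable only for a handful of nodes.

```python
from itertools import combinations


def vertex_expansion(pi, edges):
    """Brute-force min over S of pi(boundary of S) / min(pi(S), pi(S^c))."""
    n = len(pi)
    nbrs = [set() for _ in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    best = float("inf")
    for size in range(1, n):
        for subset in combinations(range(n), size):
            S = set(subset)
            # Nodes outside S that have at least one neighbour inside S.
            boundary = [i for i in range(n) if i not in S and nbrs[i] & S]
            mass = sum(pi[i] for i in S)
            ratio = sum(pi[i] for i in boundary) / min(mass, 1.0 - mass)
            best = min(best, ratio)
    return best


def two_cliques(n):
    """Edges of two n-cliques (nodes 0..n-1 and n..2n-1) joined by the edge (0, n)."""
    edges = list(combinations(range(n), 2))
    edges += list(combinations(range(n, 2 * n), 2))
    edges.append((0, n))
    return edges


n = 3
upsilon = vertex_expansion([1 / (2 * n)] * (2 * n), two_cliques(n))
lower_bound = 1 / (2 * upsilon)  # the bound of Proposition 2
print(upsilon, lower_bound)
```

On two triangles joined by an edge with the uniform distribution, the minimum is attained by taking one triangle as the subset.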



3.3 Examples 



3.3.1 $K_n$-$K_n$

This is the graph made of two $n$-node complete graphs joined by an edge.
We denote the nodes on one side of the linking edge by $1, \ldots, n$ and those
on the other side by $1', \ldots, n'$. The linking edge is $(1, 1')$. The stationary
distribution is uniform. The vertex expansion bound gives

$$
\Upsilon = \frac{1/(2n)}{n/(2n)} = \frac{1}{n}
$$

and $\tau_2^\ast \geq n/2$. To get something sharper, we appeal to our more general
bound. The bottleneck in this graph is intrinsically one-dimensional, so we
take all coordinates except the first one to be 0, i.e. we consider only $\psi_1$.
By symmetry, it is natural to map the nodes to $\psi_1(1) = -\psi_1(1') = x_0$ and
$\psi_1(i) = -\psi_1(i') = x_1$, for $i \neq 1$, with $0 < x_0 < x_1$. The main insight here
is that we should make the distance between $1$ and $1'$ as large as possible
because that pushes 0 away from all the other points at the same time (because
of the local constraints). So we take $w(i) = w(i') = 0$, for all $i \neq 1$,
and $w(1) = w(1') = n$, which gives $x_0 = \frac{\sqrt{2}}{2}\sqrt{n}$ and $x_1 = \frac{2 + \sqrt{2}}{2}\sqrt{n}$. Summing
the squares leads to a lower bound asymptotic to $(\frac{3}{2} + \sqrt{2})n > 2.914n$. In
Section 4 we give an almost matching upper bound. See also [BDPX04] for
a similar upper bound.
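The one-dimensional embedding of this example can be checked numerically. The sketch below assumes the normalization in which the weights $w$ average to 1 under the stationary distribution and squared distances along edges are at most $w(i) + w(j)$; the function name `two_clique_bound` and the node numbering (bridge endpoints 0 and $n$) are illustrative choices.

```python
import math
from itertools import combinations


def two_clique_bound(n, tol=1e-9):
    """Check feasibility of the 1-D embedding and return sum_k pi(k) psi(k)^2."""
    x0 = math.sqrt(n / 2.0)          # bridge endpoints sit at +x0 and -x0
    x1 = x0 + math.sqrt(n)           # all other nodes sit at +x1 or -x1
    psi = [x0] + [x1] * (n - 1) + [-x0] + [-x1] * (n - 1)
    w = [float(n)] + [0.0] * (n - 1) + [float(n)] + [0.0] * (n - 1)
    pi = 1.0 / (2 * n)
    edges = list(combinations(range(n), 2))
    edges += list(combinations(range(n, 2 * n), 2))
    edges.append((0, n))
    if abs(sum(pi * p for p in psi)) > tol:
        raise ValueError("embedding is not centered")
    if abs(sum(pi * wi for wi in w) - 1.0) > tol:
        raise ValueError("weights are not normalized")
    for i, j in edges:
        if (psi[i] - psi[j]) ** 2 > w[i] + w[j] + tol * n:
            raise ValueError("distance constraint violated")
    return sum(pi * p * p for p in psi)


for n in (10, 50, 200):
    print(n, two_clique_bound(n) / n)
```

The printed ratios approach $3/2 + \sqrt{2}$ as $n$ grows, in line with the asymptotic statement above.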

3.3.2 n-cycle and d-dimensional torus 

In contrast to our preceding example, the $n$-cycle gives rise naturally to a
multidimensional embedding. We let the stationary distribution be uniform.
By symmetry we choose all $w$'s equal. So all pairs of consecutive nodes have
to be embedded to points at distance (at most) $\sqrt{2}$. Our goal of maximizing
the sum of the squared norms, together with the natural symmetry, leads to
spreading the points evenly on a circle centered around the origin (in any
2-dimensional subspace of $\mathbb{R}^n$). That is, we take all coordinates except the
first two to be 0 and, numbering the nodes from 1 to $n$ in order of traversal,
we let $(\psi_1(i), \psi_2(i)) = (R\cos(2\pi i/n), R\sin(2\pi i/n))$, $i = 1, \ldots, n$, for a value
of $R$ which remains to be determined. The distance between consecutive
points has to be $\sqrt{2}$ so a little geometry suggests $R = \frac{\sqrt{2}}{2\sin(\pi/n)}$. Thus
the lower bound is $\tau_2^\ast \geq R^2 = \frac{1}{2\sin^2(\pi/n)}$, matching the relaxation time of the
symmetric walk. See e.g. [AF04].
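A quick numerical check of the circle embedding: the sketch below places the nodes on a circle so that consecutive points are at distance $\sqrt{2}$, and compares the resulting bound with the relaxation time of the walk that moves to each neighbour with probability 1/2, whose second eigenvalue is $\cos(2\pi/n)$. The function names are assumptions made for illustration.

```python
import math


def cycle_embedding_bound(n, tol=1e-9):
    """Return sum_k pi(k) ||Psi(k)||^2 for the circle embedding of the n-cycle."""
    R = math.sqrt(2.0) / (2.0 * math.sin(math.pi / n))
    pts = [(R * math.cos(2 * math.pi * i / n), R * math.sin(2 * math.pi * i / n))
           for i in range(1, n + 1)]
    # Uniform w = 1: consecutive points must be at squared distance at most 2.
    for i in range(n):
        (x, y), (u, v) = pts[i], pts[(i + 1) % n]
        if (x - u) ** 2 + (y - v) ** 2 > 2.0 + tol:
            raise ValueError("distance constraint violated")
    # The embedding must have expectation 0 under the uniform distribution.
    if abs(sum(x for x, _ in pts)) / n > tol or abs(sum(y for _, y in pts)) / n > tol:
        raise ValueError("embedding is not centered")
    return sum(x * x + y * y for x, y in pts) / n


def symmetric_walk_relaxation_time(n):
    """Relaxation time 1 / (1 - cos(2 pi / n)) of the symmetric walk on the n-cycle."""
    return 1.0 / (1.0 - math.cos(2 * math.pi / n))


for n in (5, 12, 100):
    print(n, cycle_embedding_bound(n), symmetric_walk_relaxation_time(n))
```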

One can generalize this result to the $m^d$-point grid on a $d$-dimensional
torus by considering a $2d$-dimensional embedding. For $1 \leq i_1, \ldots, i_d \leq m$,
node $(i_1, \ldots, i_d)$ is mapped to

$$
(R\cos(2\pi i_1/m), R\sin(2\pi i_1/m), \ldots, R\cos(2\pi i_d/m), R\sin(2\pi i_d/m)),
$$

with $R$ as above (with $m$ in place of $n$). Thus, $\tau_2^\ast \geq dR^2 = \frac{d}{2\sin^2(\pi/m)}$, again
matching the relaxation time of the symmetric walk. See [AF04].
3.3.3 Geometric random graphs

In their analysis of random walks on geometric random graphs, Boyd et
al. [BGPS04] consider, in a key step, a variant of the $d$-dimensional grid of the
previous example. Let $k$ be a fixed integer smaller than $m$. Again, our graph
is made of the $m^d$ points of the $d$-dimensional torus (integers modulo $m$)
with uniform stationary distribution. Two nodes $(i_1, \ldots, i_d)$ and $(j_1, \ldots, j_d)$
are connected by an edge if $|i_l - j_l|$ modulo $m$ is less than or equal to $k$ for all
$1 \leq l \leq d$ (the points are at most $k$ cells apart in every dimension). Because
of the "diagonal" edges, it seems natural to collapse all nodes on a single
$m$-cycle. More precisely, we map $(i_1, \ldots, i_d)$ to $(R\cos(2\pi i_1/m), R\sin(2\pi i_1/m))$.
We take uniform $w$'s. Because some edges connect nodes $k$ steps apart, the
radius (which is constrained by the fact that points connected by an edge
are at most $\sqrt{2}$ apart) is now $R = \frac{\sqrt{2}}{2\sin(k\pi/m)} \geq \frac{\sqrt{2}m}{2k\pi}$ (assume that $k$ divides
$n$ for convenience). Thus $\tau_2^\ast \geq \frac{m^2}{2k^2\pi^2} = \Omega((n/D_d)^{2/d})$, where $D_d$ is the degree
of each node and $n$ is the number of nodes. This bound matches the lower
bound in [BGPS04]. There, exact expressions for the eigenvalues of tensor
products of circulant matrices and the analysis of a linear program lead to a
lower bound on fastest mixing on this graph. Our geometric method is much
simpler.

Remark 4 In the previous two examples, plugging the same embeddings into
the completely specified setting (7) gives tight lower bounds on the symmetric
walks. More generally, the lower bound in Proposition 1 applies to any
completely specified chain (as do all lower bounds on fastest mixing) and it
could prove useful as an alternative to the standard variational characterization
when the precise details of the transition matrix appear too cumbersome.

4 Upper bounds 

It seems difficult to give general upper bounds on fastest mixing. An obvious 
technique is to pick an arbitrary chain and compute an upper bound on its 
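relaxation time. For example, one might use the canonical (max-degree like)
chain defined by the transition probabilities $P_d(i,j) = \pi(j)/\pi^\ast$ if $(i,j) \in \mathcal{E}$
(and 0 otherwise) with $\pi^\ast = \max\{\sum_{j : (i,j) \in \mathcal{E}} \pi(j) : i \in \mathcal{V}\}$. Let $\pi_0 =
\min_{i \in \mathcal{V}} \pi(i)$ and recall the definition of vertex expansion $\Upsilon$ from Proposition 2.
Bounding the flow of $P_d$ out of any subset $S \subset \mathcal{V}$ from below in terms of
$\pi_0$, $\pi^\ast$ and $\pi(\delta S)$, and applying the standard Cheeger inequality to $P_d$,
leads to an upper bound on $\tau_2^\ast$ that depends only on $\pi_0$, $\pi^\ast$ and $\Upsilon$.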

A different chain would have provided a different — and possibly better — 
bound. Anyhow, this Cheeger-type bound is very unlikely to lead to useful 
results, and moreover it tells us nothing about the optimal chain. 

Instead, the goal of this section is to illustrate the computation of a 
nontrivial upper bound through a canonical paths argument. The underlying 
idea is similar to that used in the lower bound above. That is, we think of 
a standard upper bound for completely specified chains as parameterized 
by transition probabilities and attempt to minimize the bound over those 
probabilities. It turns out that because of its straightforward dependence 
on the transition matrix, the canonical paths bound appears to be the most 
manageable. In this section and the next one, we show by way of examples 
that it can actually lead to sharp results. 

4.1 Canonical paths: $K_n$-$K_n$ example continued

We consider again the $K_n$-$K_n$ graph with uniform distribution. This chain is
analyzed in detail in [BDPX04], where all eigenvalues are computed using a
sophisticated group-theoretic symmetry analysis. Here, we give a
very different, much more elementary, treatment. Also, being simpler, our
approach has the potential of being applicable more generally. We proceed
as follows: we write down the canonical paths upper bound as a function of
$P$; we then choose $P$ among $\pi$-reversible chains so as to minimize the bound.
Given a set $\Gamma$ of paths $\gamma_{xy}$ in $\mathcal{G}$ for all pairs of nodes $x, y$, the canonical paths
upper bound is

$$
\tau_2(P) \leq \rho(P, \Gamma),
$$

with

$$
\rho(P, \Gamma) = \max_{e} \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)|\gamma_{xy}|,
\tag{8}
$$
where $|\gamma_{xy}|$ is the number of edges in $\gamma_{xy}$. Notice that the choice of paths
depends, crucially, only on the graph and is therefore valid for any transition
matrix consistent with $(\mathcal{G}, \pi)$. Let $W(e)$ be the numerator in (8). On
$K_n$-$K_n$, the natural choice of paths is to let $\gamma_{xy}$ be the shortest path (in
terms of number of edges) between $x$ and $y$. Then, counting each pair $\{x, y\}$ once,

$$
W(1,1') = \frac{3n^2 - 2n}{4n^2}, \qquad W(1,i) = \frac{3n}{4n^2}, \qquad W(i,j) = \frac{1}{4n^2}, \quad i, j \neq 1.
$$

Similar values hold for the other complete subgraph. The largest contribution
to the maximum above clearly comes from $W(1,1')$. In order to decrease the
ratio in $\rho(P, \Gamma)$, we need to choose a large value for $Q(1,1')$. But as we
increase $Q(1,1')$, the $Q(1,i)$'s and $Q(1',i')$'s have to be lowered accordingly.
We do so until congestion is the same on edges $(1,1')$, $(1,i)$'s and $(1',i')$'s.
That is, we require

$$
\frac{W(1,1')}{Q(1,1')} = \frac{W(1,i)}{Q(1,i)}, \quad i \neq 1, \qquad \text{with} \quad Q(1,1') + \sum_{i \neq 1} Q(1,i) = \frac{1}{2n},
$$

and similarly for the other side. The solution is

$$
Q(1,1') = \frac{n - 2/3}{2n(2n - 5/3)}, \qquad Q(1,i) = Q(1',i') = \frac{1}{2n(2n - 5/3)}.
$$

We extend this to all edges by

$$
Q(i,j) = Q(i',j') = \frac{1}{n-1}\left(\frac{1}{2n} - \frac{1}{2n(2n - 5/3)}\right), \quad i, j \neq 1.
$$

The upper bound becomes

$$
\tau_2^\ast \leq 3n(1 - 5/(6n)).
$$

Recall that our lower bound was $\tau_2^\ast \geq 2.9n$. Note that the standard chain
would have consisted in choosing a neighbour uniformly at random at each
step. The same calculation gives an upper bound of order $n^2$ in that case.
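The congestion computation for this example can be reproduced by enumerating shortest paths. The sketch below counts each unordered pair of nodes once, uses the closed-form flows on the linking edge and on the edges touching it, and spreads the remaining mass evenly over the other edges; that last choice, the function names, and the node numbering (bridge between 0 and $n$) are assumptions made for illustration.

```python
from itertools import combinations


def shortest_path(x, y, n):
    """Shortest path in K_n - K_n (cliques 0..n-1 and n..2n-1, bridge (0, n))."""
    if (x < n) == (y < n):
        return [x, y]
    near, far = (0, n) if x < n else (n, 0)
    path = [x] if x == near else [x, near]
    path.append(far)
    if y != far:
        path.append(y)
    return path


def congestion(n, Q):
    """Evaluate max over edges of W(e) / Q(e) for the uniform distribution."""
    pi = 1.0 / (2 * n)
    W = {}
    for x, y in combinations(range(2 * n), 2):
        path = shortest_path(x, y, n)
        for u, v in zip(path, path[1:]):
            e = (min(u, v), max(u, v))
            W[e] = W.get(e, 0.0) + pi * pi * (len(path) - 1)
    return max(W[e] / Q(e) for e in W)


def optimized_flow(n):
    """Edge flows equalizing congestion on the bridge and on the edges touching it."""
    denom = 2 * n * (2 * n - 5.0 / 3.0)
    q_bridge = (n - 2.0 / 3.0) / denom
    q_spoke = 1.0 / denom
    q_other = (1.0 / (2 * n) - q_spoke) / (n - 1)  # assumed extension to other edges

    def Q(e):
        if e == (0, n):
            return q_bridge
        if e[0] in (0, n):
            return q_spoke
        return q_other
    return Q


for n in (5, 8, 20):
    print(n, congestion(n, optimized_flow(n)), 3 * n - 2.5)
```

The two printed columns agree, since $3n(1 - 5/(6n)) = 3n - 5/2$.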
Remark 5 In summary, our upper bound technique consists of two steps:
identify transitions contributing to slow mixing by computing the congestion
ratio in (8); then increase as much as possible the probability of transition
on those bottleneck edges. Instead, one might try to use the same idea with
conductance (or other upper bounds). But in that case, the fact that all
cuts (instead of edges) have to be accounted for simultaneously makes the
task more difficult.

5 Optimal rates for Glauber dynamics 

In this section, we show that the framework discussed so far can be applied
to large, well-structured combinatorial problems where the numerical SDP 
method has little chance of being helpful. 

5.1 Glauber dynamics 

Let $G = (V, E)$ be a finite graph^2. A configuration on $G$ is a map $\sigma : V \to C$,
where $C$ is a finite set. Typically, $\sigma(v)$ is a spin or a color. We consider the
following stationary distribution on $C^V$:

$$
\pi(\sigma) = \frac{1}{Z} \prod_{(v,w) \in E} \alpha_{vw}(\sigma(v), \sigma(w)),
$$

where $Z$ is a normalization constant and $(v, w)$ is an undirected edge with
endpoints $v, w$. Let $S \subseteq C^V$ be the subset of $C^V$ on which $\pi$ is nonzero.
We wish to sample from $\pi$ by running a reversible MC on $S$, but allow only
transitions that change the state of one node at a time, i.e. the transition
graph is $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $\mathcal{V} = S$ and $(\sigma, \sigma') \in \mathcal{E}$ if and only if $\sigma(v) = \sigma'(v)$
for all but at most one node $v \in V$.

^2 We now have two graphs. As before, calligraphic letters are used to denote the
transition graph (see above).

Let $\sigma^{v,a}$ be the configuration

$$
\sigma^{v,a}(w) = \begin{cases} a, & \text{if } w = v, \\ \sigma(w), & \text{otherwise.} \end{cases}
$$

One such "local" MC is the so-called Glauber dynamics which, at each step,
picks a node $v$ of $G$ uniformly at random and updates the value $\sigma(v)$ according
to the transition probability distribution

$$
K(\sigma, \sigma^{v,a}) = \frac{\prod_{w : (v,w) \in E} \alpha_{vw}(a, \sigma(w))}{\sum_{a' \in C} \prod_{w : (v,w) \in E} \alpha_{vw}(a', \sigma(w))}.
$$

One can check that $K$ is $\pi$-reversible. We actually consider a generalization
of the Glauber dynamics by allowing the update rates to vary. More precisely,
at each step, we pick a node $v$ of $G$ with probability $p(v)$ for some distribution
$p : V \to [0, 1]$, and we update $\sigma(v)$ according to $K$ as above. The standard
chain corresponds to uniform $p$.
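To make the generalized dynamics concrete, the following sketch builds the full transition matrix for Ising-type interactions $\exp(\beta\sigma(v)\sigma(w))$ on a small graph and arbitrary update rates. It enumerates all $2^{|V|}$ configurations, so it is only meant for toy examples; the function name `glauber_chain` and its return format are assumptions made for illustration.

```python
import math
from itertools import product


def glauber_chain(num_nodes, edges, beta, rates):
    """Return (states, pi, P) for the heat-bath dynamics with node update rates."""
    states = list(product((-1, 1), repeat=num_nodes))
    index = {s: k for k, s in enumerate(states)}
    nbrs = [[] for _ in range(num_nodes)]
    for v, w in edges:
        nbrs[v].append(w)
        nbrs[w].append(v)
    # Stationary distribution: product of edge interactions, normalized by Z.
    weights = [math.exp(beta * sum(s[v] * s[w] for v, w in edges)) for s in states]
    Z = sum(weights)
    pi = [x / Z for x in weights]
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        for v in range(num_nodes):
            field = sum(s[w] for w in nbrs[v])
            norm = math.exp(beta * field) + math.exp(-beta * field)
            for a in (-1, 1):
                t = s[:v] + (a,) + s[v + 1:]  # configuration with spin a at node v
                P[index[s]][index[t]] += rates[v] * math.exp(beta * a * field) / norm
    return states, pi, P


# Hypothetical example: path on 3 nodes with non-uniform rates.
states, pi, P = glauber_chain(3, [(0, 1), (1, 2)], beta=0.7, rates=[0.2, 0.5, 0.3])
worst = max(abs(pi[i] * P[i][j] - pi[j] * P[j][i])
            for i in range(len(states)) for j in range(len(states)))
print(worst)
```

The printed quantity is the largest violation of detailed balance, which is zero up to rounding for any choice of rates.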

Predictably the question we ask is: can we compute the rates p mini- 
mizing the mixing time? Or at least can we get reasonable lower and upper 
bounds on fastest mixing in this restricted setting? We do so by following 
the methodology put forward in the previous sections. 

We first give an elementary bound on the best achievable improvement.
This observation is essentially due to [BDX04].

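Proposition 3 Let $P^\ast$ be the fastest chain on $(\mathcal{G}, \pi)$ (not necessarily of the
Glauber dynamics type). Also, let $p^\ast$ (resp. $U$) be the optimal (resp. uniform)
rates for the Glauber dynamics. Denote by $P_p$ the Glauber dynamics with
rates $p$ and let $\kappa = \max_{\sigma, v, a} K^{-1}(\sigma, \sigma^{v,a})$. Then,

$$
\tau_2(P_U) \leq \kappa |V| \tau_2(P^\ast),
$$

and

$$
\tau_2(P_U) \leq |V| \tau_2(P_{p^\ast}).
$$

Proof: By the variational characterization of $\lambda_2(P^\ast)$ and the fact that
$P^\ast(\sigma, \sigma') \leq 1$,

$$
\begin{aligned}
1 - \lambda_2(P^\ast)
&= \inf_{\psi} \frac{\sum_{\sigma} \sum_{v \in V} \sum_{a \in C} (\psi(\sigma) - \psi(\sigma^{v,a}))^2 \pi(\sigma) P^\ast(\sigma, \sigma^{v,a})}{\sum_{\sigma, \sigma'} (\psi(\sigma) - \psi(\sigma'))^2 \pi(\sigma)\pi(\sigma')} \\
&\leq \kappa |V| \inf_{\psi} \frac{\sum_{\sigma} \sum_{v \in V} \sum_{a \in C} (\psi(\sigma) - \psi(\sigma^{v,a}))^2 \pi(\sigma) \frac{1}{|V|} K(\sigma, \sigma^{v,a})}{\sum_{\sigma, \sigma'} (\psi(\sigma) - \psi(\sigma'))^2 \pi(\sigma)\pi(\sigma')} \\
&= \kappa |V| (1 - \lambda_2(P_U)).
\end{aligned}
$$

A similar argument gives the second inequality. ■

Thus, if $\kappa$ is $O(1)$, the best improvement over $P_U$ one can hope
for is a factor of $O(|V|)$.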

We now use a canonical paths argument similar to that in Section 4 to
obtain a general upper bound on fastest mixing for Glauber dynamics. 

Proposition 4 Let $\Gamma$ be a set of paths $\gamma_{\sigma\sigma'}$ in $\mathcal{G}$ for each pair $\sigma, \sigma'$ in $S$.
Assume we have a bound $B_v$ (depending only on $v$) on the ratio appearing
in the canonical paths bound (8) for edges of the form $(\sigma, \sigma^{v,a})$ in the uniform
rates case. Then,

$$
\tau_2(P_U) \leq \max_{v} B_v, \qquad \text{and} \qquad \tau_2(P_p) \leq \frac{1}{|V|} \sum_{v} B_v,
$$

with the choice of rates $p(v) = B_v / \sum_{u} B_u$.

Proof: The first inequality is the canonical paths bound. For the second one,
note that the ratio in (8) is multiplied by $(|V| B_v / \sum_{u} B_u)^{-1}$ when replacing
uniform rates with $p(v)$. We then apply the canonical paths bound to $P_p$
using the bound $B_v$ and the previous observation. Note that $p$ is the choice
of rates that makes all bounds on the ratio in (8) equal. ■
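The choice of rates in the proposition is a one-line computation. The sketch below, with the hypothetical helper `optimized_rates` and made-up bounds, shows how a single bad node dominates the uniform bound while the rescaled rates only pay for the average.

```python
def optimized_rates(B):
    """Given per-node congestion bounds B[v] for uniform rates, return the
    rates p(v) = B[v] / sum(B) together with the two resulting bounds."""
    total = sum(B)
    rates = [b / total for b in B]
    uniform_bound = max(B)            # bound for uniform rates
    optimized_bound = total / len(B)  # bound for the rates p
    return rates, uniform_bound, optimized_bound


# Hypothetical bounds: three well-behaved nodes and one bottleneck node.
B = [1.0, 1.0, 1.0, 9.0]
rates, uniform_bound, optimized_bound = optimized_rates(B)
# After rescaling, every node has the same bound B[v] / (|V| p(v)).
rescaled = [b / (len(B) * p) for b, p in zip(B, rates)]
print(rates, uniform_bound, optimized_bound, rescaled)
```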

The point of Proposition 4 is that optimal improvement can be attained
if most $B_v$'s are small compared to $\max_v B_v$. We give such an example in
the next subsection.

5.2 Special case: the Ising model 

We apply the previous result to the case of the Ising model on a finite graph.
Here $C = \{-1, +1\}$, $S = C^V$, and $\alpha_{vw}(\sigma(v), \sigma(w)) = \exp(\beta\sigma(v)\sigma(w))$, where
$\beta > 0$ is some constant.

As shown in [KMP01], the mixing time of the Glauber dynamics on a
graph depends on its cut-width.

Definition 1 The cut-width $\xi(G)$ of a graph $G$ is the smallest integer such
that there exists a labeling $v_1, \ldots, v_{|V|}$ of the vertices such that for all
$1 \leq k \leq |V|$ the number of edges from $\{v_1, \ldots, v_k\}$ to $\{v_{k+1}, \ldots, v_{|V|}\}$ is at most
$\xi(G)$.

To use Proposition 4 we have to define the width of each node. Let
$I : V \to \{1, \ldots, |V|\}$ be some ordering of the nodes (not necessarily optimal).
Then we let $\xi^I(v)$ be the number of edges from $\{w : I(w) \leq I(v)\}$ to
$\{w : I(w) > I(v)\}$. Let $\Delta$ be the maximum degree of $G$. Then it follows
from [KMP01] that a bound as required in Proposition 4 is

$$
B_v = |V|^2 \exp((4\xi^I(v) + 2\Delta)\beta),
$$

with in particular $\max_v B_v = |V|^2 \exp((4\xi(G) + 2\Delta)\beta)$ if $I$ is an optimal
ordering.

One can try to compute these quantities in special cases. A rather uninteresting
graph is the $s \times s$ grid. There, a natural ordering is to start from a corner,
move horizontally as far as one can, then go to the next line and start over. In
this ordering, the width of most nodes, including the maximum-width node,
is approximately $s$ and therefore using non-uniform rates has essentially no
effect.
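Node widths for a given ordering are easy to compute directly from the definition: for each node, count the edges leaving the set of nodes visited up to and including it. The sketch below does this for the row-by-row ordering of a small grid; the helper names `widths` and `grid_edges` are assumptions made for illustration.

```python
def widths(order, edges):
    """Width of each node: number of edges from the nodes visited up to and
    including it to the nodes visited later, for the given ordering."""
    position = {v: k for k, v in enumerate(order)}
    result = {}
    for v in order:
        k = position[v]
        result[v] = sum(1 for a, b in edges
                        if (position[a] <= k) != (position[b] <= k))
    return result


def grid_edges(s):
    """Edges of the s x s grid with nodes labelled (row, column)."""
    edges = []
    for r in range(s):
        for c in range(s):
            if c + 1 < s:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < s:
                edges.append(((r, c), (r + 1, c)))
    return edges


s = 4
row_major = [(r, c) for r in range(s) for c in range(s)]
grid_widths = widths(row_major, grid_edges(s))
for r in range(s):
    print([grid_widths[(r, c)] for c in range(s)])
```

Away from the first and last rows, every width printed is $s$ or $s + 1$, which illustrates why varying the rates gains little on the grid.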

Here is a more interesting example. Let $T_r^{(b)} = (V_r, E_r)$ be the complete
rooted $b$-ary tree with $r$ levels (the root is at level 0 and the leaves at level
$r$). Let $n_r$ be the number of vertices in $T_r^{(b)}$.

Proposition 5 For $\beta$ large enough, an appropriate choice of rates leads to
the estimate

$$
\tau_2(P_p) = O\left(n_r \exp\{4[(b-1)r + 1]\beta + 2\Delta\beta\}\right)
$$

as $r$ tends to $+\infty$. In contrast, the best known upper bound on the uniform
Glauber dynamics [KMP01] is

$$
\tau_2(P_U) \leq n_r^2 \exp\{4[(b-1)r + 1]\beta + 2\Delta\beta\}.
$$

Proof: A good ordering of nodes of $T_r^{(b)}$, say $I$, is given by a depth-first
search (DFS) traversal of the tree starting from the root. This implies that
$\xi(T_r^{(b)}) \leq (b-1)r + 1$ [KMP01]. Note that the width of a node $v$ is the
number of unvisited neighbours of previously visited vertices when the DFS
search reaches $v$. Therefore, the width of the root is $b$. Then, say vertex $v$ is
on level $1 \leq l < r$ and is the $q$-th child of its parent $w$ (in the DFS traversal
order). Then $\xi^I(v) = \xi^I(w) + b - q$ because (1) $v$ has $b$ children, (2) $q$ children
of $w$ have now been visited, and (3) all descendants of the first $q - 1$ children
of $w$ have been visited, so these add nothing to the width. As for nodes on
level $r$, we have similarly $\xi^I(v) = \xi^I(w) - q$ if $v$ is the $q$-th child of $w$. Thus,
the contribution to $\sum_v B_v$ of the $l$-th level, $1 \leq l < r$, is

$$
S(l) = S(l-1)\left(e^{4(b-1)\beta} + \cdots + e^{4(0)\beta}\right),
$$

with a similar expression for $l = r$. Summing over all levels, we get

$$
\frac{1}{|V|}\sum_{v} B_v = n_r e^{(4b + 2\Delta)\beta}\left(1 + c(b,\beta) + \cdots + c(b,\beta)^{r-1} + e^{-4b\beta}c(b,\beta)^{r}\right),
$$

where $c(b,\beta) = e^{4(b-1)\beta} + \cdots + e^{4(0)\beta}$.
In the low-temperature regime, i.e. for $\beta$ large (under an explicit largeness
assumption on $e^{\beta}$), this is

$$
O\left(n_r \exp\{4[(b-1)r + 1]\beta + 2\Delta\beta\}\right),
$$

whereas

$$
\max_{v} B_v = n_r^2 \exp((4\xi(T_r^{(b)}) + 2\Delta)\beta) = n_r^2 \exp\{4[(b-1)r + 1]\beta + 2\Delta\beta\}.
$$

Therefore, we get an optimal improvement of $\Theta(n_r)$ over the usual Glauber
dynamics. ■
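The width recursion used in the proof can be checked on small trees. The sketch below numbers the nodes of the complete $b$-ary tree in heap order (an assumption made for convenience), visits them in depth-first preorder, and tracks the number of edges between visited and unvisited nodes; the function name `dfs_widths` is hypothetical.

```python
def dfs_widths(b, r):
    """DFS preorder and node widths on the complete b-ary tree with levels 0..r.
    Nodes are numbered in heap order: the children of v are b*v+1, ..., b*v+b."""
    n = (b ** (r + 1) - 1) // (b - 1)

    def children(v):
        first = b * v + 1
        return list(range(first, first + b)) if first < n else []

    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children(v)))  # visit children left to right

    width, cut = {}, 0
    for v in order:
        # Visiting v removes the edge to its parent from the cut (if v is not
        # the root) and adds the edges to its still unvisited children.
        cut += len(children(v)) - (1 if v != 0 else 0)
        width[v] = cut
    return order, width


for b, r in ((2, 2), (3, 3), (4, 2)):
    order, width = dfs_widths(b, r)
    print(b, r, width[0], max(width.values()), (b - 1) * r + 1)
```

In each case the root has width $b$ and the maximum width equals $(b-1)r + 1$.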

For a lower bound, we have the following result where we assume $b = 3$
for convenience.

Proposition 6 Assume $b = 3$ and let $\epsilon = (1 + e^{2\beta})^{-1}$. Then

$$
\tau_2^\ast = \Omega\left(n_r^{-\frac{\ln(2\epsilon + 8\epsilon^2)}{\ln 3} - 1}\right)
$$

as $r$ tends to $+\infty$. In contrast, the best known lower bound in the uniform
case is

$$
\tau_2(P_U) = \Omega\left(n_r^{-\frac{\ln(2\epsilon + 8\epsilon^2)}{\ln 3}}\right).
$$

Proof: Kenyon et al. [KMP01] use recursive majority to define a cut in the
space of configurations and apply the conductance bound. The recursive
majority $m(\sigma)$ of a configuration $\sigma$ is computed as follows: set $M(v) = \sigma(v)$
for all $v$ on level $r$; starting from level $r - 1$ and up, compute $M$ on each
node by taking the majority of the values of $M$ at the children of that node;
output the value of $M$ at the root. Let $S$ be the set of configurations $\sigma$
with $m(\sigma) = +1$. It follows from [KMP01] that, under $\pi$, the probability
that a configuration is such that its recursive majority is flipped by changing
the value at a fixed leaf is at most $(2\epsilon + 8\epsilon^2)^{r-1}$. The union bound and
the $\{-1, +1\}$ symmetry imply that $\pi(\delta S^c) \leq \frac{3^r}{2}(2\epsilon + 8\epsilon^2)^{r-1}$ and $\pi(S) = \frac{1}{2}$.
By Proposition 2 we deduce $\lambda_2^\ast \geq 1 - 2 \cdot 3^r (2\epsilon + 8\epsilon^2)^{r-1}$. On the other
hand, the usual conductance bound applied to the uniform case gives that
$\lambda_2(P_U) \geq 1 - 2\Phi_S$, with

$$
\Phi_S = \pi(S)^{-1} \sum_{(\sigma,\tau) \in \mathcal{E} :\, \sigma \in S,\, \tau \in S^c} \pi(\sigma)P_U(\sigma,\tau)
\leq 2 \cdot 3^{-r} \sum_{(\sigma,\tau) \in \mathcal{E} :\, \sigma \in S,\, \tau \in S^c} \pi(\sigma)
\leq (2\epsilon + 8\epsilon^2)^{r-1},
$$

where we have used that $P_U(\sigma,\tau) \leq 3^{-r}$ for neighbours $\sigma, \tau$ [KMP01]. Since
$3^r = \Theta(n_r)$ our lower bound on fastest mixing is $\Theta(n_r)$ times smaller than
that on the standard Glauber dynamics. ■
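The recursive majority used to define the cut is straightforward to compute from the leaf values. The sketch below assumes the leaves are listed left to right and that $b$ is odd, so that no ties occur; the function name `recursive_majority` is an illustrative choice.

```python
def recursive_majority(leaves, b=3):
    """Recursive majority of +1/-1 leaf values of a complete b-ary tree (b odd).
    The list of leaves has length b**r and is given in left-to-right order."""
    values = list(leaves)
    while len(values) > 1:
        # Replace each block of b siblings by the majority value of the block.
        values = [1 if sum(values[k:k + b]) > 0 else -1
                  for k in range(0, len(values), b)]
    return values[0]


sigma = [1, 1, -1, -1, -1, 1, 1, -1, 1]
flipped_leaf = [-1] + sigma[1:]
print(recursive_majority(sigma), recursive_majority(flipped_leaf))
```

In this example, changing the value at the first leaf flips the recursive majority, so the configuration lies on the boundary of the cut.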

Obtaining tighter bounds would require a sharper analysis in the standard 
setting. 

Remark 6 We are not claiming that this choice of rates leads to the fastest
sampling algorithm for this model. Indeed, in the case of the Ising model on a
tree, a very simple propagation algorithm is much faster [EKPS00]. Rather,
our point is to establish that fastest mixing analysis is feasible on nontrivial
large-scale chains, a fact that was not immediate from previous works. It
remains to be seen whether fastest mixing ideas will find useful applications
in sampling.